Typing fixes and cleanup in estimators.py
#1249
Conversation
* Setting `feature_preprocessors` throws a ValueError; the kwarg is `feature_preprocessor`
* Updated code samples to use `.. code:: python`
* Fixes failing test by setting `random_state` for `make_x` from `sklearn.datasets`
* Uses full path in `setup.py` instead of a relative one
* Added contribution guide
* Fixes not checking for `y_test` being sparse
* Update example on extending data preprocessing with `no_preprocessing`
* Update a link for the `scenario` argument to point at the new SMAC documentation
* Fixes old references to HPOlibConfigSpace, updating them to ConfigSpace
Need to check if calling fit twice will work. Edit: Done.

```python
    'mlp',
]
self.include['classifier'] = include_estimators
self.automl_._include['classifier'] = include_estimators
```
This is due to `automl_` no longer being built during `fit`; we have to set this manually, as we do for the estimator.
```python
self.metric = log_loss

self.automl_._metric = self.metric
```
Similar here
```python
# Set the variables in the SklearnEstimator.automl
self.automl_._resampling_strategy = resampling_strategy
self.automl_._resampling_strategy_arguments = resampling_strategy_kwargs
self.automl_._get_smac_object_callback = smac_callback
```
And here
Note: Built from PR Fix sparse y test #1248; should be considered after that.
Note 2: Rebasing pulled some other PRs into this -_-
Main Change

This PR fixes some typing issues that were present. Notably, the `self.automl_` attribute is now created during `__init__()` rather than on the first call to `fit`, removing the `_build_automl()` function. This was primarily to get rid of the typing warning on `self.automl_.predict` (`None has no attribute predict()`), as mypy can not establish that `self.automl_` had been set in any methods. This PR also does some other small typing cleanups:

* `Union[SUPPORTED_TARGET_TYPES, spmatrix]` -> `SUPPORTED_TARGET_TYPES`
* `self.per_run_time_limit` is now set at construction
* `AutoSklearnEstimator` is now an `ABC`; it relies on subclasses providing two `@abstractmethod`s, `_supported_target_types` and `_automl_cls`, so that validation only has to occur once and not in each subclass
* Removed `predict_proba` from `AutoSklearnEstimator`; this should only be available in the subclass `AutoSklearnClassifier`
* Removed `AutoSklearnRegressor.fit` so it just uses the base class `fit` method. It was doing nothing but forwarding arguments.
* `AutoSklearn2Classifier` now explicitly sets attributes on the already constructed `self.automl_`, where previously it relied on it being constructed during `fit`
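As a rough illustration of the restructuring described above, here is a minimal, hypothetical sketch: the class and method names mirror the PR's description, but this is not auto-sklearn's actual code.

```python
from abc import ABC, abstractmethod


class AutoML:
    """Stand-in for the wrapped automl object (hypothetical)."""

    def fit(self, X, y):
        self.fitted_ = True
        return self

    def predict(self, X):
        return [0] * len(X)


class BaseEstimator(ABC):
    def __init__(self):
        # automl_ is built during __init__ rather than on the first
        # fit() call, so its type is never Optional and mypy can no
        # longer warn "None has no attribute predict()".
        self.automl_ = self._automl_cls()()

    @abstractmethod
    def _automl_cls(self):
        """Return the AutoML class this estimator wraps."""

    @abstractmethod
    def _supported_target_types(self):
        """Return the target types this estimator accepts."""

    def fit(self, X, y):
        # Validation against _supported_target_types would happen
        # once here, not re-implemented in each subclass.
        self.automl_.fit(X, y)
        return self

    def predict(self, X):
        return self.automl_.predict(X)


class Classifier(BaseEstimator):
    def _automl_cls(self):
        return AutoML

    def _supported_target_types(self):
        return ("binary", "multiclass", "multilabel-indicator")
```

Because the base class is an `ABC`, forgetting either abstract method in a subclass fails loudly at instantiation time rather than at first use.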
Extra Context
In general, as estimators are built in a two-stage process, during `__init__` and then remaining variables in `fit`, we end up with many attributes set to `None` during the `__init__` phase. This causes problems with mypy's typing, as it can not guarantee that they will not be `None` when they are used later. There are two solutions to this I can think of:

* Move any attributes we can create to `__init__` from `fit`. In general this is probably better practice and safer.
* Move attributes to a `@property`.

The second solution works fine when we can use the combo `@property attribute_wanted` with `self._attribute_wanted`. It has an issue when we want `@property _attribute_wanted` and `self.__attribute_wanted`. While we could have the reference as `__attribute_wanted`, with the double leading underscore, this invokes Python's name mangling: a subclass will not be able to access the attribute directly and will be required to go through the `@property _attribute_wanted`. This may or may not be a problem.
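A small self-contained sketch of that name-mangling caveat (the names here are illustrative, not from the PR):

```python
class Base:
    def __init__(self):
        # Double leading underscore: Python mangles this name to
        # _Base__value at compile time, scoped to the defining class.
        self.__value = 1

    @property
    def _value(self):
        return self.__value  # resolves to self._Base__value


class Child(Base):
    def direct_access(self):
        # Inside Child, self.__value mangles to _Child__value,
        # which was never set, so this raises AttributeError.
        return self.__value

    def via_property(self):
        # Subclasses must go through the property instead.
        return self._value


c = Child()
print(c.via_property())  # -> 1
try:
    c.direct_access()
except AttributeError:
    print("direct access fails under name mangling")
```

So choosing the `@property _attribute_wanted` / `self.__attribute_wanted` pairing effectively makes the backing attribute private to the defining class, which is exactly the trade-off described above.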